!pip install requests pandas numpy pandas_profiling matplotlib pyecharts sqlalchemy
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: requests in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (2.25.1)
Requirement already satisfied: pandas in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (1.2.3)
Requirement already satisfied: numpy in /home/babyfan/.local/lib/python3.7/site-packages (1.19.5)
Requirement already satisfied: pandas_profiling in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (2.11.0)
Requirement already satisfied: matplotlib in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (3.3.4)
Requirement already satisfied: pyecharts in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (1.9.0)
Requirement already satisfied: sqlalchemy in /home/babyfan/.local/lib/python3.7/site-packages (1.4.13)
Requirement already satisfied: python-dateutil>=2.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: pillow>=6.2.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from matplotlib) (8.1.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from matplotlib) (1.3.1)
Requirement already satisfied: six in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from cycler>=0.10->matplotlib) (1.15.0)
Requirement already satisfied: pytz>=2017.3 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas) (2021.1)
Requirement already satisfied: phik>=0.10.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.11.2)
Requirement already satisfied: tangled-up-in-unicode>=0.0.6 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.0.7)
Requirement already satisfied: scipy>=1.4.1 in /home/babyfan/.local/lib/python3.7/site-packages (from pandas_profiling) (1.5.4)
Requirement already satisfied: jinja2>=2.11.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (2.11.3)
Requirement already satisfied: visions[type_image_path]==0.6.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.6.0)
Requirement already satisfied: htmlmin>=0.1.12 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.1.12)
Requirement already satisfied: missingno>=0.4.2 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.4.2)
Requirement already satisfied: ipywidgets>=7.5.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (7.6.3)
Requirement already satisfied: confuse>=1.0.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (1.4.0)
Requirement already satisfied: attrs>=19.3.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (20.3.0)
Requirement already satisfied: seaborn>=0.10.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (0.11.1)
Requirement already satisfied: joblib in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (1.0.1)
Requirement already satisfied: tqdm>=4.48.2 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pandas_profiling) (4.60.0)
Requirement already satisfied: networkx>=2.4 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from visions[type_image_path]==0.6.0->pandas_profiling) (2.5.1)
Requirement already satisfied: imagehash in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from visions[type_image_path]==0.6.0->pandas_profiling) (4.2.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from requests) (1.26.3)
Requirement already satisfied: chardet<5,>=3.0.2 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from requests) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from requests) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from requests) (2.10)
Requirement already satisfied: pyyaml in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from confuse>=1.0.0->pandas_profiling) (5.4.1)
Requirement already satisfied: nbformat>=4.2.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (5.1.2)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (3.5.1)
Requirement already satisfied: traitlets>=4.3.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (5.0.5)
Requirement already satisfied: ipykernel>=4.5.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (5.4.3)
Requirement already satisfied: ipython>=4.0.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (7.20.0)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipywidgets>=7.5.1->pandas_profiling) (1.0.0)
Requirement already satisfied: tornado>=4.2 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets>=7.5.1->pandas_profiling) (6.1)
Requirement already satisfied: jupyter-client in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets>=7.5.1->pandas_profiling) (6.1.11)
Requirement already satisfied: setuptools>=18.5 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (53.0.0)
Requirement already satisfied: decorator in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (4.4.2)
Requirement already satisfied: pexpect>4.3 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (4.8.0)
Requirement already satisfied: pickleshare in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (3.0.14)
Requirement already satisfied: backcall in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.2.0)
Requirement already satisfied: jedi>=0.16 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.17.2)
Requirement already satisfied: pygments in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (2.7.4)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.7.1)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from jinja2>=2.11.1->pandas_profiling) (1.1.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (3.2.0)
Requirement already satisfied: ipython-genutils in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (0.2.0)
Requirement already satisfied: jupyter-core in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (4.7.1)
Requirement already satisfied: importlib-metadata in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (3.4.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (0.17.3)
Requirement already satisfied: ptyprocess>=0.5 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.7.0)
Requirement already satisfied: wcwidth in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0->ipywidgets>=7.5.1->pandas_profiling) (0.2.5)
Requirement already satisfied: notebook>=4.4.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (6.2.0)
Requirement already satisfied: terminado>=0.8.3 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.9.2)
Requirement already satisfied: Send2Trash>=1.5.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (1.5.0)
Requirement already satisfied: pyzmq>=17 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (22.0.2)
Requirement already satisfied: prometheus-client in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.9.0)
Requirement already satisfied: nbconvert in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (6.0.7)
Requirement already satisfied: argon2-cffi in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (20.1.0)
Requirement already satisfied: simplejson in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pyecharts) (3.17.2)
Requirement already satisfied: prettytable in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from pyecharts) (2.1.0)
Requirement already satisfied: greenlet!=0.4.17 in /home/babyfan/.local/lib/python3.7/site-packages (from sqlalchemy) (1.0.0)
Requirement already satisfied: cffi>=1.0.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (1.14.4)
Requirement already satisfied: pycparser in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (2.20)
Requirement already satisfied: PyWavelets in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from imagehash->visions[type_image_path]==0.6.0->pandas_profiling) (1.1.1)
Requirement already satisfied: zipp>=0.5 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from importlib-metadata->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (3.4.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from importlib-metadata->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas_profiling) (3.7.4.3)
Requirement already satisfied: defusedxml in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.6.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.8.4)
Requirement already satisfied: jupyterlab-pygments in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.1.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (1.4.3)
Requirement already satisfied: bleach in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (3.3.0)
Requirement already satisfied: entrypoints>=0.2.2 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.3)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.5.1)
Requirement already satisfied: testpath in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.4.4)
Requirement already satisfied: nest-asyncio in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (1.5.1)
Requirement already satisfied: async-generator in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (1.10)
Requirement already satisfied: packaging in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (20.9)
Requirement already satisfied: webencodings in /opt/miniconda3/envs/spider/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas_profiling) (0.5.1)
WARNING: You are using pip version 21.0.1; however, version 21.1.2 is available.
You should consider upgrading via the '/opt/miniconda3/envs/spider/bin/python -m pip install --upgrade pip' command.
import os
import zipfile
from datetime import datetime
import requests
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
import matplotlib.pyplot as plt
from pyecharts.charts import Map, Bar,Line
from pyecharts.charts import Geo
from pyecharts.datasets import register_url
from pyecharts import options as opts
import sqlalchemy as sqla
def extract_vaccination():
zip_file_name = './vaccination.zip'
if not os.path.isfile(zip_file_name):
res = requests.get(
url='https://storage.googleapis.com/kaggle-data-sets/1113000/2306784/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210606%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210606T075204Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=907f961742f530922a3d1ede569cf5d26ed9eeb02074297d83e11c05ff911a29e450b3f29dee5970467d0b53a8cbeb2b4c02d94594e1dcee71f6041fda1d3f827d71d5cd75dcc705ea4d07a9aa9bf61b6bd086708d041c7d74603c32951eee1aadc95874f63c8fb979bf934d82b324e11fdb60c079bdf32219f4c6c9d6c1f3694676e54a7258cf39baef3ec2c33b1abaa8f6236e012abf4c86d6c30cd39bcd8a89d2bd6f0bc48d3edbc98a53dddd6985e2832c64e32d684febec9ec5edac8f1d9889f03168f8e324579d0ca23b4504843832ee33c02f906692b41dd19ec80cf74428241e33783e3d9ea2e31d6f4095f682c6859352e77d57d06823cd71dfd997'
)
with open(zip_file_name, "wb") as f:
f.write(res.content)
azip = zipfile.ZipFile(zip_file_name)
csv_files = azip.namelist()
assert len(csv_files) == 1, 'zip file ERROR!'
csv_name = azip.extract(csv_files[0], os.getcwd())
return csv_name
else:
return './us_state_vaccinations.csv'
def extract_cases():
cases_file = './cases.csv'
if not os.path.isfile(cases_file):
res = requests.get(
url='https://data.cdc.gov/api/views/9mfq-cb36/rows.csv?accessType=DOWNLOAD'
)
with open(cases_file, "wb") as f:
f.write(res.content)
return cases_file
vacc_file = extract_vaccination()
case_file = extract_cases()
vacc_df = pd.read_csv(vacc_file)
vacc_df = vacc_df[["date","location","total_vaccinations"]]
vacc_df
| date | location | total_vaccinations | |
|---|---|---|---|
| 0 | 2021-01-12 | Alabama | 78134.0 |
| 1 | 2021-01-13 | Alabama | 84040.0 |
| 2 | 2021-01-14 | Alabama | 92300.0 |
| 3 | 2021-01-15 | Alabama | 100567.0 |
| 4 | 2021-01-16 | Alabama | NaN |
| ... | ... | ... | ... |
| 9443 | 2021-06-01 | Wyoming | 395636.0 |
| 9444 | 2021-06-02 | Wyoming | 397488.0 |
| 9445 | 2021-06-03 | Wyoming | 397654.0 |
| 9446 | 2021-06-04 | Wyoming | 399448.0 |
| 9447 | 2021-06-05 | Wyoming | 399497.0 |
9448 rows × 3 columns
case_df = pd.read_csv(case_file)
case_df = case_df.sort_values(by=["submission_date", "state"])
# date transformation
case_df['submission_date'] = case_df['submission_date'].apply(
lambda date_str: datetime.strptime(date_str, '%m/%d/%Y').__format__(
'%Y-%m-%d'))
# replace abbreviations with its full name
state_name = [
"Alabama", "AL", "Alaska", "AK", "Arizona", "AZ", "Arkansas", "AR",
"California", "CA", "Colorado", "CO", "Connecticut", "CT", "Delaware",
"DE", "Florida", "FL", "Georgia", "GA", "Hawaii", "HI", "Idaho", "ID",
"Illinois", "IL", "Indiana", "IN", "Iowa", "IA", "Kansas", "KS",
"Kentucky", "KY", "Louisiana", "LA", "Maine", "ME", "Maryland", "MD",
"Massachusetts", "MA", "Michigan", "MI", "Minnesota", "MN", "Mississippi",
"MS", "Missouri", "MO", "Montana", "MT", "Nebraska", "NE", "Nevada", "NV",
"New Hampshire", "NH", "New Jersey", "NJ", "New Mexico", "NM", "New York",
"NYC", "North Carolina", "NC", "North Dakota", "ND", "Ohio", "OH",
"Oklahoma", "OK", "Oregon", "OR", "Pennsylvania", "PA", "Rhode Island",
"RI", "South Carolina", "SC", "South Dakota", "SD", "Tennessee", "TN",
"Texas", "TX", "Utah", "UT", "Vermont", "VT", "Virginia", "VA",
"Washington", "WA", "West Virginia", "WV", "Wisconsin", "WI", "Wyoming",
"WY"
]
fullnames, abbreviations = state_name[0::2], state_name[1::2]
abb_dict = {}
for full_name, abbreviation in zip(fullnames, abbreviations):
abb_dict[abbreviation] = full_name
case_df['state'] = case_df['state'].apply(
lambda abbr: abb_dict.get(abbr, abbr))
case_df = case_df[[
"submission_date", "state", "tot_cases", "tot_death", "new_case"
]]
# rename the columns
case_df = case_df.rename(columns={
'submission_date': 'date',
'state': 'location'
})
case_df
| date | location | tot_cases | tot_death | new_case | |
|---|---|---|---|---|---|
| 5599 | 2021-01-01 | Alaska | 46241 | 285 | 470 |
| 23384 | 2021-01-01 | Alabama | 390699 | 7217 | 2295 |
| 4893 | 2021-01-01 | Arkansas | 229442 | 3711 | 4304 |
| 6004 | 2021-01-01 | AS | 3 | 0 | 0 |
| 10360 | 2021-01-01 | Arizona | 530622 | 9015 | 10063 |
| ... | ... | ... | ... | ... | ... |
| 7879 | 2020-12-31 | Vermont | 7412 | 136 | 136 |
| 9274 | 2020-12-31 | Washington | 246413 | 3461 | 3958 |
| 21820 | 2020-12-31 | Wisconsin | 520438 | 5242 | 4212 |
| 1299 | 2020-12-31 | West Virginia | 85334 | 1338 | 1109 |
| 11509 | 2020-12-31 | Wyoming | 44409 | 438 | 276 |
30000 rows × 5 columns
# merge the two dataset and replace nan with 0
df = pd.merge(vacc_df,
case_df,
how="right",
left_on=["date", "location"],
right_on=["date", "location"])
df = df.sort_values(by=["date", "location"])
df.replace(np.nan, 0, inplace=True, regex=True)
df['total_vaccinations'] = df['total_vaccinations'].astype('int')
df
# df.set_index("date")
| date | location | total_vaccinations | tot_cases | tot_death | new_case | |
|---|---|---|---|---|---|---|
| 1263 | 2020-01-22 | AS | 0 | 0 | 0 | 0 |
| 1261 | 2020-01-22 | Alabama | 0 | 38 | 0 | 38 |
| 1260 | 2020-01-22 | Alaska | 0 | 0 | 0 | 0 |
| 1264 | 2020-01-22 | Arizona | 0 | 0 | 0 | 0 |
| 1262 | 2020-01-22 | Arkansas | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 17393 | 2021-06-04 | Virginia | 8446018 | 676586 | 11229 | 286 |
| 17396 | 2021-06-04 | Washington | 7755360 | 439263 | 5836 | 719 |
| 17398 | 2021-06-04 | West Virginia | 1325601 | 162232 | 2813 | 121 |
| 17397 | 2021-06-04 | Wisconsin | 5517836 | 675150 | 7954 | 211 |
| 17399 | 2021-06-04 | Wyoming | 399448 | 60628 | 720 | 79 |
30000 rows × 6 columns
# storage the dataset to sqlite
TABLE = 'DATE'
engine = sqla.create_engine('sqlite:///covid.sqlite')
df.to_sql(name=TABLE, con=engine, if_exists='replace', index=False)
# read the data in chunks
data = pd.read_sql_query(f"SELECT * FROM {TABLE} WHERE date='2021-06-04' ", con=engine)
data
| date | location | total_vaccinations | tot_cases | tot_death | new_case | |
|---|---|---|---|---|---|---|
| 0 | 2021-06-04 | AS | 0 | 0 | 0 | 0 |
| 1 | 2021-06-04 | Alabama | 3032765 | 546144 | 11215 | 562 |
| 2 | 2021-06-04 | Alaska | 616473 | 67648 | 362 | 57 |
| 3 | 2021-06-04 | Arizona | 6015930 | 883037 | 17673 | 346 |
| 4 | 2021-06-04 | Arkansas | 2095644 | 342142 | 5845 | 253 |
| 5 | 2021-06-04 | California | 38825789 | 3687736 | 62179 | 1047 |
| 6 | 2021-06-04 | Colorado | 5688458 | 545769 | 6611 | 767 |
| 7 | 2021-06-04 | Connecticut | 4071439 | 347891 | 8246 | 143 |
| 8 | 2021-06-04 | DC | 0 | 49068 | 1136 | 27 |
| 9 | 2021-06-04 | Delaware | 951447 | 109005 | 1674 | 48 |
| 10 | 2021-06-04 | FSM | 0 | 1 | 0 | 0 |
| 11 | 2021-06-04 | Florida | 18982387 | 2286332 | 36973 | 0 |
| 12 | 2021-06-04 | GU | 0 | 8204 | 139 | 0 |
| 13 | 2021-06-04 | Georgia | 7783684 | 1125512 | 20942 | 495 |
| 14 | 2021-06-04 | Hawaii | 1595765 | 35217 | 498 | 65 |
| 15 | 2021-06-04 | Idaho | 1246785 | 192795 | 2101 | 165 |
| 16 | 2021-06-04 | Illinois | 11786638 | 1384365 | 25282 | 626 |
| 17 | 2021-06-04 | Indiana | 5159399 | 746135 | 13661 | 445 |
| 18 | 2021-06-04 | Iowa | 2867134 | 371863 | 6067 | 102 |
| 19 | 2021-06-04 | Kansas | 2343100 | 314855 | 5084 | 166 |
| 20 | 2021-06-04 | Kentucky | 3680332 | 460049 | 7086 | 509 |
| 21 | 2021-06-04 | Louisiana | 3112365 | 472981 | 10615 | 364 |
| 22 | 2021-06-04 | MP | 0 | 183 | 2 | 0 |
| 23 | 2021-06-04 | Maine | 1503653 | 68154 | 839 | 97 |
| 24 | 2021-06-04 | Maryland | 6212322 | 460472 | 9639 | 133 |
| 25 | 2021-06-04 | Massachusetts | 8146584 | 707723 | 17903 | 200 |
| 26 | 2021-06-04 | Michigan | 8773775 | 994506 | 20511 | 554 |
| 27 | 2021-06-04 | Minnesota | 5461633 | 602134 | 7538 | 253 |
| 28 | 2021-06-04 | Mississippi | 1790337 | 318189 | 7325 | 0 |
| 29 | 2021-06-04 | Missouri | 4713270 | 602019 | 9140 | 516 |
| 30 | 2021-06-04 | Montana | 869261 | 112260 | 1632 | 131 |
| 31 | 2021-06-04 | NY | 0 | 1152423 | 19808 | 429 |
| 32 | 2021-06-04 | Nebraska | 1715953 | 223558 | 2249 | 41 |
| 33 | 2021-06-04 | Nevada | 2535207 | 325392 | 5600 | 361 |
| 34 | 2021-06-04 | New Hampshire | 1413342 | 98877 | 1355 | 37 |
| 35 | 2021-06-04 | New Jersey | 9237904 | 1017337 | 26265 | 288 |
| 36 | 2021-06-04 | New Mexico | 2167318 | 203472 | 4280 | 142 |
| 37 | 2021-06-04 | New York | 0 | 950643 | 33286 | 328 |
| 38 | 2021-06-04 | North Carolina | 8208717 | 1004669 | 13151 | 680 |
| 39 | 2021-06-04 | North Dakota | 617621 | 110178 | 1516 | 27 |
| 40 | 2021-06-04 | Ohio | 9984951 | 1104001 | 19980 | 621 |
| 41 | 2021-06-04 | Oklahoma | 2975700 | 453662 | 7320 | 110 |
| 42 | 2021-06-04 | Oregon | 4123145 | 202675 | 2686 | 428 |
| 43 | 2021-06-04 | PR | 0 | 138949 | 2516 | 76 |
| 44 | 2021-06-04 | PW | 0 | 1 | 0 | 0 |
| 45 | 2021-06-04 | Pennsylvania | 12805618 | 1204802 | 27325 | 703 |
| 46 | 2021-06-04 | RMI | 0 | 4 | 0 | 0 |
| 47 | 2021-06-04 | Rhode Island | 1174718 | 151992 | 2717 | 56 |
| 48 | 2021-06-04 | South Carolina | 3825550 | 593975 | 9754 | 238 |
| 49 | 2021-06-04 | South Dakota | 780255 | 124261 | 2022 | 19 |
| 50 | 2021-06-04 | Tennessee | 4888109 | 863840 | 12476 | 220 |
| 51 | 2021-06-04 | Texas | 23293586 | 2950622 | 50590 | 1929 |
| 52 | 2021-06-04 | Utah | 2591362 | 407113 | 2311 | 288 |
| 53 | 2021-06-04 | VI | 0 | 3560 | 28 | 0 |
| 54 | 2021-06-04 | Vermont | 783392 | 24252 | 256 | 12 |
| 55 | 2021-06-04 | Virginia | 8446018 | 676586 | 11229 | 286 |
| 56 | 2021-06-04 | Washington | 7755360 | 439263 | 5836 | 719 |
| 57 | 2021-06-04 | West Virginia | 1325601 | 162232 | 2813 | 121 |
| 58 | 2021-06-04 | Wisconsin | 5517836 | 675150 | 7954 | 211 |
| 59 | 2021-06-04 | Wyoming | 399448 | 60628 | 720 | 79 |
# data = data[data["new_case"] >= 0]
data = df
data = data.sort_values(by=["date", "location"])
# Understand the correlation between data fields
data.corr(method='spearman')
| total_vaccinations | tot_cases | tot_death | new_case | |
|---|---|---|---|---|
| total_vaccinations | 1.000000 | 0.573292 | 0.499132 | 0.233277 |
| tot_cases | 0.573292 | 1.000000 | 0.970880 | 0.818912 |
| tot_death | 0.499132 | 0.970880 | 1.000000 | 0.807957 |
| new_case | 0.233277 | 0.818912 | 0.807957 | 1.000000 |
data.describe()
| total_vaccinations | tot_cases | tot_death | new_case | |
|---|---|---|---|---|
| count | 3.000000e+04 | 3.000000e+04 | 30000.000000 | 30000.000000 |
| mean | 6.272857e+05 | 2.096239e+05 | 4321.446533 | 1103.413600 |
| std | 2.309074e+06 | 4.173634e+05 | 7824.500752 | 2584.025928 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000 | -33355.000000 |
| 25% | 0.000000e+00 | 1.601750e+03 | 30.000000 | 14.000000 |
| 50% | 0.000000e+00 | 4.865250e+04 | 1022.000000 | 298.000000 |
| 75% | 0.000000e+00 | 2.301368e+05 | 5368.000000 | 1104.000000 |
| max | 3.882579e+07 | 3.687736e+06 | 62179.000000 | 60315.000000 |
# export report for EDA
report = ProfileReport(data)
REPORT_FILE = './report.html'
if not os.path.isfile(REPORT_FILE):
report.to_file(REPORT_FILE)
# report
data_group = data.groupby("date", as_index=False).sum()
data_group = data.groupby("date", as_index=False).sum()
data_group.plot(
x='date',
y=[
'total_vaccinations',
# 'tot_cases',
'new_case',
],
secondary_y=[
# 'tot_cases',
'new_case',
],
title='Indicators',
figsize=(20, 6),
)
<AxesSubplot:title={'center':'Indicators'}, xlabel='date'>
data_group = data.groupby("date", as_index=False).sum()
data_group.plot(
'total_vaccinations',
'new_case',
kind='scatter',
title='new cases V.s. total vaccinations',
figsize=(18, 6),
)
<AxesSubplot:title={'center':'new cases V.s. total vaccinations'}, xlabel='total_vaccinations', ylabel='new_case'>
From the two plots we can know that when the vaccination is started, the number of new infections begins to decline, which is in line with normal expectations, that is, the vaccine has a certain effect on COVID-19.
data_group_state = data.groupby(["date", "location"], as_index=False).sum()
final_date = data_group_state[data_group_state['date'] ==
data_group_state['date'].max()].sort_values(
by=['total_vaccinations'], ascending=False)
final_date.plot(
x='location',
y=[
'total_vaccinations',
'tot_cases',
],
secondary_y=[
'tot_cases',
],
kind='bar',
title='2021-06-04 Total vaccination and Total cases of each State',
figsize=(25, 8),
)
<AxesSubplot:title={'center':'2021-06-04 Total vaccination and Total cases of each State'}, xlabel='location'>
As can be seen in the chart above, California had the highest number of infected and vaccinated people on the last day of data collection.
bar = Bar()
x = list(final_date.location.values)
bar.add_xaxis(x)
new_case = [int(i) for i in final_date.new_case.values]
bar.add_yaxis('new_case', new_case)
tot_cases = [int(i) for i in final_date.tot_cases.values]
bar.add_yaxis('tot_cases', tot_cases)
bar.set_global_opts(title_opts=opts.TitleOpts(title="test"))
bar.reversal_axis()
bar.render_notebook()
from pyecharts import options as opts
from pyecharts.charts import Bar3D
from pyecharts.faker import Faker
date = list(data_group_state.date.values)
location = list(data_group_state.location.values)
new_case = [int(i) for i in data_group_state.new_case.values]
bar = Bar3D(init_opts=opts.InitOpts(width="1500px", height="1000px")).add(
"",
[[d[1], d[0], d[2]] for d in zip(date, location, new_case)],
xaxis3d_opts=opts.Axis3DOpts(type_="category"),
yaxis3d_opts=opts.Axis3DOpts(type_="category"),
zaxis3d_opts=opts.Axis3DOpts(type_="value"),
).set_global_opts(
visualmap_opts=opts.VisualMapOpts(max_=200),
title_opts=opts.TitleOpts(title="new_cases"),
)
bar.render_notebook()
The chart shows the development of the epidemic situation from three dimensions of time, location and number of new infections. It can be seen that no matter in any state, the number of new infections was better controlled on the last day of statistics than before.